Code
library(nnet)
library(car)
library(tidyverse)
library(emmeans)
library(ggeffects)
library(knitr)
library(patchwork)
library(broom)
library(parameters)
library(easystats)Jason Geller
March 24, 2024
The data for this assignment comes from an online Ipsos survey that was conducted for the FiveThirtyEight article “Why Many Americans Don’t Vote”. You can read more about the survey design and respondents in the README of the GitHub repo for the data.
Respondents were asked a variety of questions about their political beliefs, thoughts on multiple issues, and voting behavior. We will focus on using the demographic variables and someone’s party identification to understand whether a person is a probable voter.
The variables we’ll focus on were (definitions from the codebook in data set GitHub repo):
ppage: Age of respondent
educ: Highest educational attainment category.
race: Race of respondent, census categories. Note: all categories except Hispanic were non-Hispanic.
gender: Gender of respondent
income_cat: Household income category of respondent
Q30: Response to the question “Generally speaking, do you think of yourself as a…”
voter_category: past voting behavior:
You can read in the data directly from the GitHub repo:
The variable Q30 contains the respondent’s political party identification. Make a new variable that simplifies Q30 into four categories: “Democrat”, “Republican”, “Independent”, “Other” (“Other” also includes respondents who did not answer the question).
The variable voter_category identifies the respondent’s past voter behavior. Relevel the variable to make rarely/never the baseline level, followed by sporadic, then always
In the FiveThirtyEight article, the authors include visualizations of the relationship between the voter category and demographic variables such as race, age, education, etc. Select two demographic variables. For each variable, try to replicate the visualizations and interpret the plot to describe its relationship with voter category. Have fun with it: https://www.mikelee.co/posts/2020-02-08-recreate-fivethirtyeight-chicklet-stacked-bar-chart-in-ggplot2.
# library
library(ggplot2)
library(viridis)
library(cowplot)
voter_data$race <- factor(voter_data$race, levels =c("Black", "Hispanic", "Other/Mixed", "White"))
p_race <- ggplot(data = voter_data, aes(x = fct_rev(race), fill = voter_category)) +
geom_bar(position = "fill") +
labs(x="Race", y="Percentage") +
theme(text = element_text(size = 16)) +
scale_x_discrete(limits = rev(levels("race")))+
scale_fill_viridis(discrete = TRUE) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
coord_flip()
p_racevoter_data <- voter_data %>%
mutate(pol = fct_relevel(pol_ident_new,"Dem", "Rep", "Indep", "Other"))
p_id <- ggplot(voter_data, aes(x = fct_rev(pol), fill = voter_category)) +
geom_bar(position = "fill") +
labs(x="Political ID", y="Percentage") +
theme(text = element_text(size = 16)) +
scale_fill_viridis(discrete = TRUE) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
coord_flip()
p_idFit a model using mean-centered age, race, gender, income, and education to predict voter category. Show the code used to fit the model, but do not display the model output.
> Yes. It should be included.
Use the model you select for the remainder of the assignment.
Run the full model and report overall significance of each of the terms
ppage, \(\chi^2(2)\) = 666.41, educ, \(\chi^2(4)\) = 252.81, p < .001, and pol \(\chi^2(6)\) = 171.91, p < .001
| y.level | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| sporadic | (Intercept) | 1.5221680 | 0.0822932 | 18.4968772 | 0.0000000 |
| sporadic | ppage | 0.0457443 | 0.0023031 | 19.8621499 | 0.0000000 |
| sporadic | educHigh school or less | -1.0372010 | 0.0877549 | -11.8192889 | 0.0000000 |
| sporadic | educSome college | -0.3864438 | 0.0904127 | -4.2742199 | 0.0000192 |
| sporadic | polRep | -0.0388733 | 0.0964455 | -0.4030594 | 0.6869045 |
| sporadic | polIndep | -0.3802996 | 0.0941814 | -4.0379495 | 0.0000539 |
| sporadic | polOther | -0.9621902 | 0.1042842 | -9.2266187 | 0.0000000 |
| always | (Intercept) | 1.3124074 | 0.0867341 | 15.1313876 | 0.0000000 |
| always | ppage | 0.0590465 | 0.0025373 | 23.2711612 | 0.0000000 |
| always | educHigh school or less | -1.4771769 | 0.0984861 | -14.9988400 | 0.0000000 |
| always | educSome college | -0.4481505 | 0.0969629 | -4.6218753 | 0.0000038 |
| always | polRep | -0.0020977 | 0.1027836 | -0.0204086 | 0.9837174 |
| always | polIndep | -0.4879135 | 0.1025275 | -4.7588553 | 0.0000019 |
| always | polOther | -1.4043029 | 0.1287482 | -10.9073562 | 0.0000000 |
multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="pol")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)| contrast | pol | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| sporadic - (rarely/never) | Dem | 0.963 | 0.064 | 14 | 15.001 | 0.000 |
| always - (rarely/never) | Dem | 0.608 | 0.067 | 14 | 9.051 | 0.000 |
| sporadic - (rarely/never) | Rep | 0.925 | 0.071 | 14 | 13.089 | 0.000 |
| always - (rarely/never) | Rep | 0.606 | 0.074 | 14 | 8.233 | 0.000 |
| sporadic - (rarely/never) | Indep | 0.600 | 0.069 | 14 | 8.660 | 0.000 |
| always - (rarely/never) | Indep | 0.144 | 0.075 | 14 | 1.933 | 0.217 |
| sporadic - (rarely/never) | Other | 0.054 | 0.081 | 14 | 0.673 | 0.866 |
| always - (rarely/never) | Other | -0.723 | 0.105 | 14 | -6.873 | 0.000 |
For every political party listed, voting is more likely. That is, Democrats (sporadic - (rarely/never) 2.6195433, Republicans(sporadic - (rarely/never 2.5092904, and Independents (sporadic - (rarely/never1.8221188 were more likely to sporadically vote compared to rarely/never. Voters affiliated with a political party were also more likely to always vote compared to never/rarely vote (Democrats:1.8367542; Republicans: 1.8404314; Independents: 1.1502738 ). The exception here is Other. Others were more likely to rarely vote compared to always vote (0.4852942
multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="educ")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)| contrast | educ | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| sporadic - (rarely/never) | College | 1.101 | 0.068 | 14 | 16.299 | 0.000 |
| always - (rarely/never) | College | 0.781 | 0.070 | 14 | 11.193 | 0.000 |
| sporadic - (rarely/never) | High school or less | 0.112 | 0.057 | 14 | 1.974 | 0.167 |
| always - (rarely/never) | High school or less | -0.629 | 0.068 | 14 | -9.296 | 0.000 |
| sporadic - (rarely/never) | Some college | 0.728 | 0.067 | 14 | 10.811 | 0.000 |
| always - (rarely/never) | Some college | 0.352 | 0.071 | 14 | 4.951 | 0.001 |
College educated (OR = 3.004166), High school or less (OR = 1.1185129), and Some college (OR = 1.1185129) were more likely to sporadically vote compared to rarely/never. College educated voters (OR = 1.1185129), and some college (OR = 1.1185129) were always more likely to vote compared to rarely/never. High school or less (OR = 0.5331247) were less likely to always vote compared to rarely/never.
Next, plot the predicted probabilities of voter category as a function of Age and Party ID
::: {.cell}
ggemmeans(mm_use, terms = c("ppage")) %>% ggplot(., aes(x = x, y = predicted, fill = response.level)) +
geom_area() +
geom_rug(sides = "b", position = "jitter", alpha = .5) +
labs(x = "\nAge", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Age") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()::: {.cell-output-display} ::: :::
::: {.cell}
ggemmeans(mm_use, terms = c("educ")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) +
geom_bar(stat = "identity" ) +
geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4) +
labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Educational Attainment") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()::: {.cell-output-display} ::: :::
::: {.cell}
ggemmeans(mm_use, terms = c("pol")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) +
geom_bar(stat = "identity" ) +
geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4) +
labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Party Identification") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()::: {.cell-output-display} ::: :::
## Write-up
multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="pol")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)| contrast | pol | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| sporadic - (rarely/never) | Dem | 0.963 | 0.064 | 14 | 15.001 | 0.000 |
| always - (rarely/never) | Dem | 0.608 | 0.067 | 14 | 9.051 | 0.000 |
| sporadic - (rarely/never) | Rep | 0.925 | 0.071 | 14 | 13.089 | 0.000 |
| always - (rarely/never) | Rep | 0.606 | 0.074 | 14 | 8.233 | 0.000 |
| sporadic - (rarely/never) | Indep | 0.600 | 0.069 | 14 | 8.660 | 0.000 |
| always - (rarely/never) | Indep | 0.144 | 0.075 | 14 | 1.933 | 0.217 |
| sporadic - (rarely/never) | Other | 0.054 | 0.081 | 14 | 0.673 | 0.866 |
| always - (rarely/never) | Other | -0.723 | 0.105 | 14 | -6.873 | 0.000 |
| contrast1 | contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| Rep - Dem | sporadic - (rarely/never) | -0.039 | 0.094 | 14 | -0.412 | 0.976 |
| Indep - Dem | sporadic - (rarely/never) | -0.363 | 0.091 | 14 | -3.974 | 0.007 |
| Indep - Rep | sporadic - (rarely/never) | -0.324 | 0.098 | 14 | -3.323 | 0.023 |
| Other - Dem | sporadic - (rarely/never) | -0.909 | 0.102 | 14 | -8.934 | 0.000 |
| Other - Rep | sporadic - (rarely/never) | -0.871 | 0.106 | 14 | -8.183 | 0.000 |
| Other - Indep | sporadic - (rarely/never) | -0.546 | 0.106 | 14 | -5.176 | 0.001 |
| Rep - Dem | always - (rarely/never) | -0.002 | 0.098 | 14 | -0.018 | 1.000 |
| Indep - Dem | always - (rarely/never) | -0.464 | 0.097 | 14 | -4.755 | 0.002 |
| Indep - Rep | always - (rarely/never) | -0.462 | 0.103 | 14 | -4.464 | 0.003 |
| Other - Dem | always - (rarely/never) | -1.331 | 0.124 | 14 | -10.725 | 0.000 |
| Other - Rep | always - (rarely/never) | -1.329 | 0.128 | 14 | -10.400 | 0.000 |
| Other - Indep | always - (rarely/never) | -0.867 | 0.129 | 14 | -6.739 | 0.000 |
multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="educ")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)| contrast | educ | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| sporadic - (rarely/never) | College | 1.101 | 0.068 | 14 | 16.299 | 0.000 |
| always - (rarely/never) | College | 0.781 | 0.070 | 14 | 11.193 | 0.000 |
| sporadic - (rarely/never) | High school or less | 0.112 | 0.057 | 14 | 1.974 | 0.167 |
| always - (rarely/never) | High school or less | -0.629 | 0.068 | 14 | -9.296 | 0.000 |
| sporadic - (rarely/never) | Some college | 0.728 | 0.067 | 14 | 10.811 | 0.000 |
| always - (rarely/never) | Some college | 0.352 | 0.071 | 14 | 4.951 | 0.001 |
| contrast1 | contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| High school or less - College | sporadic - (rarely/never) | -0.989 | 0.087 | 14 | -11.414 | 0.000 |
| Some college - College | sporadic - (rarely/never) | -0.372 | 0.088 | 14 | -4.212 | 0.002 |
| Some college - High school or less | sporadic - (rarely/never) | 0.616 | 0.087 | 14 | 7.086 | 0.000 |
| High school or less - College | always - (rarely/never) | -1.410 | 0.096 | 14 | -14.696 | 0.000 |
| Some college - College | always - (rarely/never) | -0.429 | 0.093 | 14 | -4.599 | 0.001 |
| Some college - High school or less | always - (rarely/never) | 0.981 | 0.097 | 14 | 10.150 | 0.000 |
A multinomial model was estimated using the
nnetpackage in R to investigate whether political party identification (Democrat, Independent, Republican, Other), education (high school or less, some college, college degree), and age (grand mean centered; M = 51.69) influence voting frequency (rarely or never vote, vote sporadically, almost always vote). All three predictors were significantly associated with voting frequency: party identification, \(\chi^2\) (6) = 171.91, p < .001; education, \(\chi^2\) (4) = 252.81, p < .001; and age, \(\chi^2\) (2) = 666.41, p < .001, \(R^2_{mcfadden}\) = .09. The odds of Independents (and those who support other parties or none) voting sporadically (versus rarely or never) were lower. Specifically, independents were times less likely compared to Democrats, lower compared to Republicans. Others were 0.4029269 times lower compared to Democrats, lower compared to Republicans. The odds that Republicans, relative to Democrats, voted sporadically was negligible. The pattern of results is similar when comparing the odds of always voting versus rarely or never voting. Supporters of all other parties in our data had lower odds of always voting compared to Democrats (Independents: OR = 0.6287636; Other: OR = 0.2644773) and Republicans (Independents: OR = 0.6300223; Other: OR = 0.2671353.
Those with high school and some college education were more likely to rarely vote compared to sporadically vote compared to college educated persons (High School: OR = 0.3719485, p < .001) and some college (OR = 0.6893542, p < .001) or always (High School: OR = 0.2441433, p < .001) and (some college: OR = 0.6511599, p < .001). Stated a bit differently, college voters were more likely to vote than those with a high school or some college education. We also see that those with some college education vs. High school or less were more likely to sporadically vote (Some College: OR = 1.8515072, p < .001) or always vote (Some College: OR = 2.667122, p < .001) compared to rarely/never vote.
For each one-year increase in age beyond 52 (the mean), the odds of voting sporadically (versus rarely or never) were 1.05 times higher (p < .001) relative to the baseline voter. They were 1.06 times higher for always voting (versus rarely or never).
---
title: "Lab 4 - Answers"
author:
- name: Jason Geller
date: last-modified
format:
html:
self-contained: true
anchor-sections: true
code-tools: true
code-fold: true
fig-width: 8
fig-height: 4
code-block-bg: "#f1f3f5"
code-block-border-left: "#31BAE9"
mainfont: Source Sans Pro
theme: journal
toc: true
toc-depth: 3
toc-location: left
captions: true
cap-location: margin
table-captions: true
tbl-cap-location: margin
reference-location: margin
pdf:
pdf-engine: lualatex
toc: false
number-sections: true
number-depth: 2
top-level-division: section
reference-location: document
listings: false
header-includes:
\usepackage{marginnote, here, relsize, needspace, setspace}
\def\it{\emph}
comments:
hypothesis: false
execute:
warning: false
message: false
---
## Data
The data for this assignment comes from an online Ipsos survey that was conducted for the FiveThirtyEight article ["Why Many Americans Don't Vote"](https://projects.fivethirtyeight.com/non-voters-poll-2020-election/). You can read more about the survey design and respondents in the README of the [GitHub repo](https://github.com/fivethirtyeight/data/tree/master/non-voters) for the data.
Respondents were asked a variety of questions about their political beliefs, thoughts on multiple issues, and voting behavior. We will focus on using the demographic variables and someone's party identification to understand whether a person is a probable voter.
The variables we'll focus on were (definitions from the codebook in data set GitHub repo):
- `ppage`: Age of respondent
- `educ`: Highest educational attainment category.\
- `race`: Race of respondent, census categories. Note: all categories except Hispanic were non-Hispanic.
- `gender`: Gender of respondent
- `income_cat`: Household income category of respondent
- `Q30`: Response to the question "Generally speaking, do you think of yourself as a..."
- 1: Republican
- 2: Democrat
- 3: Independent
- 4: Another party, please specify
- 5: No preference
- -1: No response
- `voter_category`: past voting behavior:
- **always**: respondent voted in all or all-but-one of the elections they were eligible in
- **sporadic**: respondent voted in at least two, but fewer than all-but-one of the elections they were eligible in
- **rarely/never**: respondent voted in 0 or 1 of the elections they were eligible in
You can read in the data directly from the GitHub repo:
```{r}
library(nnet)
library(car)
library(tidyverse)
library(emmeans)
library(ggeffects)
library(knitr)
library(patchwork)
library(broom)
library(parameters)
library(easystats)
```
```{r}
voter_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/non-voters/nonvoters_data.csv")
```
# Lab
- The variable `Q30` contains the respondent's political party identification. Make a new variable that simplifies `Q30` into four categories: "Democrat", "Republican", "Independent", "Other" ("Other" also includes respondents who did not answer the question).
```{r}
voter_data <- voter_data %>%
mutate(pol_ident_new = case_when(
Q30==1 ~ "Rep",
Q30==2 ~ "Dem",
Q30==3 ~ "Indep",
TRUE ~ "Other"
))
```
- The variable `voter_category` identifies the respondent's past voter behavior. Relevel the variable to make rarely/never the baseline level, followed by sporadic, then always
```{r}
voter_data$voter_category <- factor(voter_data$voter_category, levels =c("rarely/never", "sporadic", "always"))
```
```{r}
# center var
voter_data$ppage <- datawizard::center(voter_data$ppage)
```
- In the [FiveThirtyEight article](https://projects.fivethirtyeight.com/non-voters-poll-2020-election/), the authors include visualizations of the relationship between the voter category and demographic variables such as race, age, education, etc. Select two demographic variables. For each variable, try to replicate the visualizations and interpret the plot to describe its relationship with voter category. Have fun with it: https://www.mikelee.co/posts/2020-02-08-recreate-fivethirtyeight-chicklet-stacked-bar-chart-in-ggplot2.
```{r}
# library
library(ggplot2)
library(viridis)
library(cowplot)
voter_data$race <- factor(voter_data$race, levels =c("Black", "Hispanic", "Other/Mixed", "White"))
p_race <- ggplot(data = voter_data, aes(x = fct_rev(race), fill = voter_category)) +
geom_bar(position = "fill") +
labs(x="Race", y="Percentage") +
theme(text = element_text(size = 16)) +
scale_x_discrete(limits = rev(levels("race")))+
scale_fill_viridis(discrete = TRUE) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
coord_flip()
p_race
```
```{r}
voter_data <- voter_data %>%
mutate(pol = fct_relevel(pol_ident_new,"Dem", "Rep", "Indep", "Other"))
p_id <- ggplot(voter_data, aes(x = fct_rev(pol), fill = voter_category)) +
geom_bar(position = "fill") +
labs(x="Political ID", y="Percentage") +
theme(text = element_text(size = 16)) +
scale_fill_viridis(discrete = TRUE) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
coord_flip()
p_id
```
```{r}
library(patchwork)
p_id+ p_race + plot_layout(guides = "collect") + plot_annotation(title = 'Demographic information of survey repsondents, by voting history')
```
- Fit a model using mean-centered age, race, gender, income, and education to predict voter category. Show the code used to fit the model, but do **not** display the model output.
```{r}
#| results: hide
#|
library(nnet)
mm <- multinom(voter_category~ ppage + race + gender+ income_cat + educ, data=voter_data)
model_parameters(mm)
```
- *Should party identification be added to the model?*
```{r}
#| message: false
#| results: hide
mm_red<- multinom(voter_category~ ppage + race + gender+ income_cat + educ, data=voter_data)
mm_full <- multinom(voter_category~ ppage + race + gender+ income_cat + educ + pol, data=voter_data)
anova(mm_red, mm_full)
```
> Yes. It should be included.
**Use the model you select for the remainder of the assignment**.
## LRT
- Run the full model and report overall significance of each of the terms
```{r}
#| results: hide
mm_use <- multinom(voter_category~ ppage+ educ + pol, data=voter_data)
tidy(car::Anova(mm_use)) %>%
kable()
```
> ppage, $\chi^2(2)$ = 666.41, educ, $\chi^2(4)$ = 252.81, p \< .001, and pol $\chi^2(6)$ = 171.91, p \< .001
```{r}
mm_use %>%
tidy() %>%
kable()
```
e.
## Marginal Effects Political Group - Emmeans
```{r}
multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="pol")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)
```
> For every political party listed, voting is more likely. That is, Democrats (sporadic - (rarely/never) `r exp(0.963)`, Republicans(sporadic - (rarely/never `r exp(0.92)`, and Independents (sporadic - (rarely/never`r exp(0.60)` were more likely to sporadically vote compared to rarely/never. Voters affiliated with a political party were also more likely to always vote compared to never/rarely vote (Democrats:`r exp(0.608)`; Republicans: `r exp(0.61)`; Independents: `r exp(0.14)` ). The exception here is Other. Others were more likely to rarely vote compared to always vote (`r exp(-.723)`
## Marginal Effects of Education - Emmeans
```{r}
multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="educ")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)
```
> College educated (OR = `r exp(1.10)`), High school or less (OR = `r exp(.112)`), and Some college (OR = `r exp(.112)`) were more likely to sporadically vote compared to rarely/never. College educated voters (OR = `r exp(.112)`), and some college (OR = `r exp(.112)`) were always more likely to vote compared to rarely/never. High school or less (OR = `r exp(-.629)`) were less likely to always vote compared to rarely/never.
- Next, plot the predicted probabilities of voter category as a function of Age and Party ID
```{r}
ggemmeans(mm_use, terms = c("ppage")) %>% ggplot(., aes(x = x, y = predicted, fill = response.level)) +
geom_area() +
geom_rug(sides = "b", position = "jitter", alpha = .5) +
labs(x = "\nAge", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Age") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()
```
```{r}
ggemmeans(mm_use, terms = c("educ")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) +
geom_bar(stat = "identity" ) +
geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4) +
labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Educational Attainment") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()
```
```{r}
ggemmeans(mm_use, terms = c("pol")) %>% ggplot(., aes(x = x, y = predicted,fill = response.level)) +
geom_bar(stat = "identity" ) +
geom_text(aes(label = round(predicted, 3)), color="white", position = position_fill(vjust = 0.5), size = 4) +
labs(x = "\nEducation", y = "Predicted Probablity\n", title = "Predicted Probabilities of Voting Frequency by Party Identification") +
scale_fill_manual(
name = NULL,
values = c("always" = "#F6B533", "sporadic" = "#D07EA2", "rarely/never" = "#9854F7"),
labels = c("RARELY OR NEVER VOTE ", "SOMETIMES VOTE ", "ALMOST ALWAYS VOTE "),
breaks = c("rarely/never", "sporadic", "always")
) +
theme_minimal()
```
## Write-up
### Differences between political groups and voting behavior - Emmeans
```{r}
multi_an <- emmeans(mm_use, ~ pol|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="pol")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)
# get difference between yes-no and fair-excellent
contrast(coefs, "revpairwise", by = "contrast") %>%
kable(format = "markdown", digits = 3)
```
### Differences between education level and voting behavior - Emmeans
```{r}
multi_an <- emmeans(mm_use, ~ educ|voter_category)
# uses baseline as contrast of interest
# can change this to get other baselines
# use trt.vs.ctrl" #ref = newbaseline
coefs = contrast(regrid(multi_an, "log"),"trt.vs.ctrl1", by="educ")
update(coefs, by = "contrast") %>%
kable(format = "markdown", digits = 3)
# get difference between yes-no and fair-excellent
contrast(coefs, "revpairwise", by = "contrast") %>%
kable(format = "markdown", digits = 3)
```
> A multinomial model was estimated using the `nnet` package in R to investigate whether political party identification (Democrat, Independent, Republican, Other), education (high school or less, some college, college degree), and age (grand mean centered; M = 51.69) influence voting frequency (rarely or never vote, vote sporadically, almost always vote). All three predictors were significantly associated with voting frequency: party identification, $\chi^2$ (6) = 171.91, p \< .001; education, $\chi^2$ (4) = 252.81, p \< .001; and age, $\chi^2$ (2) = 666.41, p \< .001, $R^2_{mcfadden}$ = .09. The odds of Independents (and those who support other parties or none) voting sporadically (versus rarely or never) were lower. Specifically, independents were `r OR = exp(-0.363)` times less likely compared to Democrats, `r OR = exp(-0.324 )` lower compared to Republicans. Others were `r exp(-0.909)` times lower compared to Democrats, `r OR = exp(-0.871)` lower compared to Republicans. The odds that Republicans, relative to Democrats, voted sporadically was negligible. The pattern of results is similar when comparing the odds of always voting versus rarely or never voting. Supporters of all other parties in our data had lower odds of always voting compared to Democrats (Independents: OR = `r exp(-0.464)`; Other: OR = `r exp(-1.33)`) and Republicans (Independents: OR = `r exp(-0.462)`; Other: OR = `r exp(-1.32)`.
> Those with high school and some college education were more likely to rarely vote compared to sporadically vote compared to college educated persons (High School: OR = `r exp(-0.989)`, p \< .001) and some college (OR = `r exp(-0.372)`, p \< .001) or always (High School: OR = `r exp(-1.41)`, p \< .001) and (some college: OR = `r exp(-0.429)`, *p* \< .001). Stated a bit differently, college voters were more likely to vote than those with a high school or some college education. We also see that those with some college education vs. High school or less were more likely to sporadically vote (Some College: OR = `r exp(0.616)`, *p* \< .001) or always vote (Some College: OR = `r exp(.981)`, p \< .001) compared to rarely/never vote.
> For each one-year increase in age beyond 52 (the mean), the odds of voting sporadically (versus rarely or never) were 1.05 times higher (p \< .001) relative to the baseline voter. They were 1.06 times higher for always voting (versus rarely or never).